90 research outputs found
Graphical Data Mining for Computational Estimation in Materials Science Applications
In domains such as Materials Science, experimental results are often plotted as two-dimensional graphs of a dependent versus an independent variable to aid visual analysis. Performing laboratory experiments with specified input conditions and plotting such graphs consumes significant time and resources, motivating the need for computational estimation. The goals are to estimate the graph obtained in an experiment given its input conditions, and to estimate the conditions needed to obtain a desired graph. State-of-the-art estimation approaches are not found suitable for the targeted applications. In this dissertation, an estimation approach called AutoDomainMine is proposed. In AutoDomainMine, graphs from existing experiments are clustered, and decision tree classification is used to learn the conditions characterizing these clusters in order to build a representative pair of input conditions and graph per cluster. This forms knowledge discovered from existing experiments. Given the conditions of a new experiment, the relevant decision tree path is traced to estimate its cluster. The representative graph of that cluster is the estimated graph. Alternatively, given a desired graph, the closest matching representative graph is found. The conditions of the corresponding representative pair are the estimated conditions. One sub-problem of this dissertation is preserving semantics of graphs during clustering. This is addressed through our proposed technique, LearnMet, for learning domain-specific distance metrics for graphs. Starting from a guessed initial metric in any fixed clustering algorithm, LearnMet iteratively compares actual and predicted clusters over a training set and refines the metric until the error between actual and predicted clusters is minimal or below a given threshold. Another sub-problem is capturing the relevant details of each cluster through its representative yet conveying concise information.
This is addressed by our proposed methodology, DesRept, for designing semantics-preserving cluster representatives by capturing various levels of detail in the cluster, taking into account ease of interpretation and information loss based on the interests of targeted users. The tool developed using AutoDomainMine is rigorously evaluated with real data in the Heat Treating domain that motivated this dissertation. Formal user surveys comparing the estimation with the laboratory experiments indicate that AutoDomainMine provides satisfactory estimation.
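The estimation flow described in this abstract (cluster the graphs, learn a decision tree from conditions to clusters, then estimate in either direction) can be illustrated with a minimal sketch. This is not the dissertation's implementation: plain k-means with Euclidean distance stands in for the clustering step with a learned LearnMet metric, the tree is a small Gini-based tree over categorical conditions, and each cluster's representative conditions are taken from the member closest to the cluster mean, which is an assumption made here. The condition names (`quenchant`, `agitation`) and all data values are hypothetical.

```python
from collections import Counter

def dist(a, b):
    """Euclidean distance between two curves sampled at the same x-values."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

def kmeans(curves, k, iters=20):
    """Plain k-means, seeded with the first k curves (arbitrary but deterministic)."""
    centers = [list(c) for c in curves[:k]]
    labels = [0] * len(curves)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(c, centers[j])) for c in curves]
        for j in range(k):
            members = [c for c, l in zip(curves, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(v) / len(v) for v in zip(*members)]
    return labels, centers

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, attrs):
    """Recursive decision tree over categorical conditions given as dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf: a cluster id
    def split_cost(attr):
        groups = {}
        for r, l in zip(rows, labels):
            groups.setdefault(r[attr], []).append(l)
        return sum(len(g) * gini(g) for g in groups.values())
    best = min(attrs, key=split_cost)
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in {r[best] for r in rows}:
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree([r for r, _ in sub], [l for _, l in sub], rest)
    return {"attr": best, "branches": branches, "default": majority}

def classify(tree, row):
    """Trace the decision tree path for one set of input conditions."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree

def representative_conditions(conditions, curves, labels, centers):
    """Assumption: use the conditions of the member nearest to each cluster mean.
    Assumes every cluster has at least one member."""
    reps = []
    for j, center in enumerate(centers):
        members = [i for i, l in enumerate(labels) if l == j]
        reps.append(conditions[min(members, key=lambda i: dist(curves[i], center))])
    return reps

def estimate_graph(tree, centers, new_conditions):
    """Conditions -> estimated graph (the cluster's representative graph)."""
    return centers[classify(tree, new_conditions)]

def estimate_conditions(centers, rep_conditions, desired_graph):
    """Desired graph -> conditions of the closest representative pair."""
    j = min(range(len(centers)), key=lambda i: dist(desired_graph, centers[i]))
    return rep_conditions[j]

# Hypothetical toy experiments: (input conditions, sampled graph values).
conditions = [
    {"quenchant": "oil", "agitation": "low"},
    {"quenchant": "oil", "agitation": "high"},
    {"quenchant": "water", "agitation": "low"},
    {"quenchant": "water", "agitation": "high"},
]
curves = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [5.0, 6.0, 7.0], [5.2, 6.2, 7.2]]

labels, centers = kmeans(curves, k=2)
tree = build_tree(conditions, labels, ["quenchant", "agitation"])
rep_conditions = representative_conditions(conditions, curves, labels, centers)

estimated_graph = estimate_graph(tree, centers, {"quenchant": "water", "agitation": "low"})
estimated_conditions = estimate_conditions(centers, rep_conditions, [1.0, 2.0, 3.2])
```

In this toy data the two quenchants separate the graphs cleanly, so the tree splits on `quenchant` alone. A learned distance metric in the spirit of LearnMet would replace `dist`, and a DesRept-style representative would replace the simple cluster mean.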
The Hidden Web, XML and Semantic Web: A Scientific Data Management Perspective
The World Wide Web no longer consists just of HTML pages. Our work sheds
light on a number of trends on the Internet that go beyond simple Web pages.
The hidden Web provides a wealth of data in semi-structured form, accessible
through Web forms and Web services. These services, as well as numerous other
applications on the Web, commonly use XML, the eXtensible Markup Language. XML
has become the lingua franca of the Internet that allows customized markups to
be defined for specific domains. On top of XML, the Semantic Web grows as a
common structured data source. In this work, we first explain each of these
developments in detail. Using real-world examples from scientific domains of
great interest today, we then demonstrate how these new developments can assist
the managing, harvesting, and organization of data on the Web. On the way, we
also illustrate the current research avenues in these domains. We believe that
this effort would help bridge multiple database tracks, thereby attracting
researchers with a view to extend database technology.
Comment: EDBT - Tutorial (2011)
Machine Learning Approaches in Agile Manufacturing with Recycled Materials for Sustainability
It is important to develop sustainable processes in materials science and
manufacturing that are environmentally friendly. AI can play a significant role
in decision support here as evident from our earlier research leading to tools
developed using our proposed machine learning based approaches. Such tools
served the purpose of computational estimation and expert systems. This
research addresses environmental sustainability in materials science via
decision support in agile manufacturing using recycled and reclaimed materials.
It is a safe and responsible way to turn a specific waste stream to value-added
products. We propose to use data-driven methods in AI by applying machine
learning models for predictive analysis to guide decision support in
manufacturing. This includes harnessing artificial neural networks to study
parameters affecting heat treatment of materials and impacts on their
properties; deep learning via advances such as convolutional neural networks to
explore grain size detection; and other classifiers such as Random Forests to
analyze phase fraction detection. Results with all these methods seem
promising enough to warrant further work, e.g. ANN yields accuracy around 90% for
predicting micro-structure development as per quench tempering, a heat
treatment process. Future work entails several challenges: investigating
various computer vision models (VGG, ResNet etc.) to find optimal accuracy,
efficiency and robustness adequate for sustainable processes; creating
domain-specific tools using machine learning for decision support in agile
manufacturing; and assessing impacts on sustainability with metrics
incorporating the appropriate use of recycled materials as well as the
effectiveness of developed products. Our work has an impact on green technology
for smart manufacturing, and is motivated by related work in the highly
interesting realm of AI for materials science.
Optical Character Recognition and Transcription of Berber Signs from Images in a Low-Resource Language Amazigh
The Berber, or Amazigh, language family is a low-resource North African
vernacular language spoken by the indigenous Berber ethnic group. It has its
own unique alphabet called Tifinagh used across Berber communities in Morocco,
Algeria, and other countries. The Afroasiatic language Berber is spoken by 14 million
people, yet lacks adequate representation in education, research, web
applications etc. For instance, there is no option of translation to or from
Amazigh / Berber on Google Translate, which hosts over 100 languages today.
Consequently, we do not find specialized educational apps, L2 (2nd language
learner) acquisition, automated language translation, and remote-access
facilities enabled in Berber. Motivated by this background, we propose a
supervised approach called DaToBS for Detection and Transcription of Berber
Signs. The DaToBS approach entails the automatic recognition and transcription
of Tifinagh characters from signs in photographs of natural environments. This
is achieved by self-creating a corpus of 1862 pre-processed character images;
curating the corpus with human-guided annotation; and feeding it into an OCR
model via the deployment of CNN for deep learning based on computer vision
models. We deploy computer vision modeling (rather than language models)
because there are pictorial symbols in this alphabet, this deployment being a
novel aspect of our work. The DaToBS experimentation and analyses yield over 92
percent accuracy in our research. To the best of our knowledge, ours is among
the first few works in the automated transcription of Berber signs from
roadside images with deep learning, yielding high accuracy. This can pave the
way for developing pedagogical applications in the Berber language, thereby
addressing an important goal of outreach to underrepresented communities via AI
in education.
Emerging multidisciplinary research across database management systems
The database community is exploring more and more multidisciplinary avenues:
Data semantics overlaps with ontology management; reasoning tasks venture into
the domain of artificial intelligence; and data stream management and
information retrieval shake hands, e.g., when processing Web click-streams.
These new research avenues become evident, for example, in the topics that
doctoral students choose for their dissertations. This paper surveys the
emerging multidisciplinary research by doctoral students in database systems
and related areas. It is based on PIKM 2010, which is the 3rd Ph.D.
workshop at the International Conference on Information and Knowledge
Management (CIKM). The topics addressed include ontology development, data
streams, natural language processing, medical databases, green energy, cloud
computing, and exploratory search. In addition to core ideas from the workshop,
we list some open research questions in these multidisciplinary areas.
Hey Dona! Can you help me with student course registration?
In this paper, we present a demo of an intelligent personal agent called Hey
Dona (or just Dona) with virtual voice assistance in student course
registration. It is a deployed project in the theme of AI for education. In
this digital age with a myriad of smart devices, users often delegate tasks to
agents. While pointing and clicking supersedes the erstwhile command-typing,
modern devices allow users to speak commands for agents to execute tasks,
enhancing speed and convenience. In line with this progress, Dona is an
intelligent agent catering to student needs by automated, voice-operated course
registration, spanning a multitude of accents, entailing task planning
optimization, with some language translation as needed. Dona accepts voice
input by microphone (Bluetooth, wired microphone), converts human voice to
computer understandable language, performs query processing as per user
commands, connects with the Web to search for answers, models task
dependencies, incorporates quality control, and conveys output by speaking to users
as well as displaying text, thus enabling human-AI interaction by speech cum
text. It is meant to work seamlessly on desktops, smartphones etc. and in
indoor as well as outdoor settings. To the best of our knowledge, Dona is among
the first of its kind as an intelligent personal agent for voice assistance in
student course registration. Due to its ubiquitous access for educational
needs, Dona directly impacts AI for education. It makes a broader impact on
smart city characteristics of smart living and smart people due to its
contributions to providing benefits for new ways of living and assisting 21st
century education, respectively.
Early Identification of Implicit Requirements with the COTIR Approach using Common Sense, Ontology and Text Mining
The ability of a system to meet its requirements is a strong determinant of success. Thus, an effective Software Requirements Specification (SRS) is crucial. Explicit Requirements are well-defined needs for a system to execute. IMplicit Requirements (IMRs) are assumed needs that a system is expected to fulfill though not elicited during requirements gathering. Studies have shown that a major factor in the failure of software systems is the presence of unhandled IMRs. Since handling relevant IMRs is important for efficient system functionality, methods have been developed to aid the identification and management of IMRs. In this research, we emphasize that commonsense knowledge, in the field of Knowledge Representation in AI, would be useful to automatically identify and manage IMRs. This research is aimed at identifying the sources of IMRs and also proposing an automated support tool for managing IMRs within an organizational context. Since this is found to be a present gap in practice, our work makes a contribution here. We propose a novel approach called COTIR (Commonsense, Ontology and Text mining for Implicit Requirements) to identify and manage IMRs. As the name implies, COTIR is based on an integrated framework of three core technologies: commonsense knowledge (CSK), text mining and ontology. We claim that discovery and handling of unknown and non-elicited requirements would reduce risks and costs in software development.
- …